- Title
- Identifying impactful service system problems via log analysis
- Creator
- He, Shilin; Lin, Qingwei; Lou, Jian-Guang; Zhang, Hongyu; Lyu, Michael R.; Zhang, Dongmei
- Relation
- ESEC/FSE 2018: The ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering. Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (Lake Buena Vista, FL 4-9 November, 2018) p. 60-70
- Publisher Link
- http://dx.doi.org/10.1145/3236024.3236083
- Publisher
- Association for Computing Machinery
- Resource Type
- conference paper
- Date
- 2018
- Description
- Logs are often used for troubleshooting in large-scale software systems. For a cloud-based online system that provides 24/7 service, a huge number of logs could be generated every day. However, these logs are highly imbalanced in general, because most logs indicate normal system operations, and only a small percentage of logs reveal impactful problems. Problems that lead to the decline of system KPIs (Key Performance Indicators) are impactful and should be fixed by engineers with a high priority. Furthermore, there are various types of system problems, which are hard to distinguish manually. In this paper, we propose Log3C, a novel clustering-based approach to promptly and precisely identify impactful system problems, by utilizing both log sequences (a sequence of log events) and system KPIs. More specifically, we design a novel cascading clustering algorithm, which can greatly save the clustering time while keeping high accuracy by iteratively sampling, clustering, and matching log sequences. We then identify the impactful problems by correlating the clusters of log sequences with system KPIs. Log3C is evaluated on real-world log data collected from an online service system at Microsoft, and the results confirm its effectiveness and efficiency. Furthermore, our approach has been successfully applied in industrial practice.
- Subject
- log analysis; problem identification; clustering; service systems
- Identifier
- http://hdl.handle.net/1959.13/1409786
- Identifier
- uon:36059
- Identifier
- ISBN:9781450355735
- Language
- eng